Aide-Memoire. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality

نویسنده

  • David L. Donoho
چکیده

The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral Imagery, Internet Portals, Financial tick-by-tick data, and DNA Microarrays are just a few of the betterknown sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena (e.g. instance ↔ human being), these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, wellchosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables – voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or even movies, so that a single observation has dimensions in the thousands or billions, while there are only tens or hundreds of instances available for study. Classical methods are simply not designed to cope with this kind of explosive growth of dimensionality of the observation vector. We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don’t know what they are yet. Mathematicians are ideally prepared for appreciating the abstract issues involved in finding patterns in such high-dimensional data. Two of the most influential principles in the coming century will be principles originally discovered and cultivated by mathematicians: the blessings of dimensionality and the curse of dimensionality. The curse of dimensionality is a phrase used by several subfields in the mathematical sciences; I use it here to refer to the apparent intractability of systematically searching through a high-dimensional space, the apparent intractability of accurately approximating a general high-dimensional function, the apparent intractability of integrating a high-dimensional function. The blessings of dimensionality are less widely noted, but they include the concentration of measure phenomenon (so-called in the geometry of Banach spaces), which means that certain random fluctuations are very well controlled in high dimensions and the success of asymptotic methods, used widely in mathematical statistics and statistical physics, which suggest that statements about very high-dimensional settings may be made where moderate dimensions would be too complicated.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality

The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral Imagery, Internet Portals, Financial tick-by-tick data, and DNA Microarrays are just a few of the betterknown sources, feeding data in torrential streams i...

متن کامل

A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...

متن کامل

Educational psychology in medical learning: a randomised controlled trial of two aide memoires for the recall of causes of electromechanical dissociation.

OBJECTIVES Although mnemonics are commonly used in medical education there are few data on their effectiveness. A RCT was undertaken to test the hypothesis that a new aide memoire, "EMD-aide", would be superior to the conventional "4Hs+4Ts" mnemonic in facilitating recall of causes of electromechanical dissociation (EMD) among house officers. METHOD "EMD-aide", organises causes of EMD by freq...

متن کامل

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques has been applied in a lossless dimensionality reduction framework, for face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...

متن کامل

Dimensionality analysis of subsurface structures in magnetotellurics using different methods (a case study: oil field in Southwest of Iran)

Magnetotelluric (MT) method is an electromagnetic technique that uses the earth natural field to map the electrical resistivity changes in subsurface structures. Because of the high penetration depth of the electromagnetic fields in this method (tens of meters to tens of kilometers), the MT data is used to investigate the shallow to deep subsurface geoelectrical structures and their dimensions....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000